1.
Paediatr Perinat Epidemiol ; 37(4): 266-275, 2023 05.
Article in English | MEDLINE | ID: mdl-36938831

ABSTRACT

BACKGROUND: Linked datasets that enable longitudinal assessments are scarce in low- and middle-income countries. OBJECTIVES: We aimed to assess the linkage of administrative databases of live births and under-five child deaths to explore mortality and trends for preterm, small for gestational age (SGA), and large for gestational age (LGA) births in Mexico. METHODS: We linked individual-level datasets collected by National statistics from 2008 to 2019. Linkage was performed based on agreement on birthday, sex, and residential address. We used the Centre for Data and Knowledge Integration for Health software to identify the best candidate pairs based on similarity. Accuracy was assessed by calculating the area under the receiver operating characteristic curve. We evaluated completeness by comparing the number of linked records with reported deaths. We described the percentage of linked records by baseline characteristics to identify potential bias. Using the linked dataset, we calculated mortality rate ratios (RR) in neonates, infants, and children under five according to gestational age, birthweight, and size. RESULTS: For the period 2008-2019, a total of 24,955,172 live births and 321,165 under-five deaths were available for linkage. We excluded 1,539,046 records (6.2%) with missing or implausible values. We successfully linked 231,765 deaths (72.2%; range 57.1% in 2009 to 84.3% in 2011). The rate of neonatal mortality was higher for preterm compared with term (RR 3.83, 95% confidence interval [CI], 3.78, 3.88) and for SGA compared with appropriate for gestational age (AGA) (RR 1.22, 95% CI, 1.19, 1.24). Births at <28 weeks had the highest mortality (RR 35.92, 95% CI, 34.97, 36.88). LGA had no additional risk vs AGA among children under five (RR 0.92, 95% CI, 0.90, 0.93). CONCLUSIONS: We demonstrated the utility of linked data to understand neonatal vulnerability and child mortality. We created a linked dataset that would be a valuable resource for future population-based research.
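
Illustrative note: the abstract reports ratios with 95% confidence intervals but does not state how they were estimated. The sketch below shows one common textbook approach, a ratio of cumulative risks with a Wald interval on the log scale. The function name `risk_ratio_ci` and the counts in the usage lines are hypothetical and are not taken from the study.

```python
import math

def risk_ratio_ci(deaths_exposed, n_exposed, deaths_ref, n_ref, z=1.96):
    """Ratio of two cumulative risks with a Wald confidence interval on the log scale.

    Assumption: this is a generic textbook estimator, not the method used in the paper.
    """
    risk_exposed = deaths_exposed / n_exposed
    risk_ref = deaths_ref / n_ref
    rr = risk_exposed / risk_ref
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(
        1 / deaths_exposed - 1 / n_exposed + 1 / deaths_ref - 1 / n_ref
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical counts: 30 deaths among 1,000 exposed births, 10 among 1,000 reference births.
rr, lower, upper = risk_ratio_ci(30, 1000, 10, 1000)
print(f"RR {rr:.2f} (95% CI {lower:.2f}, {upper:.2f})")
```

With very large denominators, as in national birth registries, the interval becomes narrow, which is consistent with the tight intervals quoted in the abstract.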


Subject(s)
Infant Mortality , Live Birth , Infant , Pregnancy , Female , Child , Infant, Newborn , Humans , Live Birth/epidemiology , Mexico/epidemiology , Birth Weight , Weight Gain , Information Storage and Retrieval
2.
PeerJ ; 10: e13507, 2022.
Article in English | MEDLINE | ID: mdl-35846888

ABSTRACT

Background: Public health research frequently requires the integration of information from different data sources. However, errors in the records and the high computational costs involved make linking large administrative databases using record linkage (RL) methodologies a major challenge. Methods: We present Tucuxi-BLAST, a versatile tool for probabilistic RL that utilizes a DNA-encoded approach to encrypt, analyze and link massive administrative databases. Tucuxi-BLAST encodes the identification records into DNA. The BLASTn algorithm is then used to align the sequences between databases. We tested and benchmarked it on a simulated database containing records for 300 million individuals and also on four large administrative databases containing real data on Brazilian patients. Results: Our method was able to overcome misspellings and typographical errors in administrative databases. In processing the RL of the largest simulated dataset (200k records), the state-of-the-art method took 5 days and 7 h to perform the RL, while Tucuxi-BLAST only took 23 h. When compared with five existing RL tools applied to a gold-standard dataset from real health-related databases, Tucuxi-BLAST had the highest accuracy and speed. By repurposing genomic tools, Tucuxi-BLAST can improve data-driven medical research and provide a fast and accurate way to link individual information across several administrative databases.
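
Illustrative note: the abstract does not describe the encoding scheme itself. As a rough sketch of the general idea, the code below maps each byte of an identifying string to four nucleotides (two bits each), so that a single-character typo changes at most four bases and the two sequences still align closely. The alphabet order, the function names, and the sample names are all assumptions for illustration and do not reproduce Tucuxi-BLAST.

```python
# Hypothetical 2-bits-per-nucleotide mapping; NOT the Tucuxi-BLAST scheme.
NUCLEOTIDES = "ACGT"

def encode_to_dna(text: str) -> str:
    """Encode each UTF-8 byte as four nucleotides, most significant bits first."""
    bases = []
    for byte in text.encode("utf-8"):
        for shift in (6, 4, 2, 0):
            bases.append(NUCLEOTIDES[(byte >> shift) & 0b11])
    return "".join(bases)

def decode_from_dna(sequence: str) -> str:
    """Invert encode_to_dna: every group of four nucleotides becomes one byte."""
    data = bytearray()
    for i in range(0, len(sequence), 4):
        byte = 0
        for base in sequence[i:i + 4]:
            byte = (byte << 2) | NUCLEOTIDES.index(base)
        data.append(byte)
    return data.decode("utf-8")

def mismatches(seq_a: str, seq_b: str) -> int:
    """Count differing positions between two equal-length sequences."""
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)

original = encode_to_dna("MARIA")
with_typo = encode_to_dna("MARLA")  # one substituted character
print(original, with_typo, mismatches(original, with_typo))
```

In a real pipeline the encoded sequences would be written to FASTA files and aligned with BLASTn; that step is omitted here because the abstract gives no parameters for it.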


Subject(s)
Biomedical Research , Medical Record Linkage , Humans , Medical Record Linkage/methods , Databases, Factual , Brazil , Public Health
4.
BMC Med Inform Decis Mak ; 20(1): 289, 2020 11 09.
Article in English | MEDLINE | ID: mdl-33167998

ABSTRACT

BACKGROUND: Record linkage is the process of identifying and combining records about the same individual from two or more different datasets. While there are many open source and commercial data linkage tools, the volume and complexity of currently available datasets for linkage pose a huge challenge; hence, designing an efficient linkage tool with reasonable accuracy and scalability is required. METHODS: We developed CIDACS-RL (Centre for Data and Knowledge Integration for Health - Record Linkage), a novel iterative deterministic record linkage algorithm based on a combination of indexing search and scoring algorithms (provided by Apache Lucene). We described how the algorithm works and compared its performance with four open source linkage tools (AtyImo, Febrl, FRIL and RecLink) in terms of sensitivity and positive predictive value using a gold standard dataset. We also evaluated its accuracy and scalability using a case study and its scalability and execution time using a simulated cohort in serial (single core) and multi-core (eight core) computation settings. RESULTS: Overall, the CIDACS-RL algorithm had a superior performance: positive predictive value (99.93% versus AtyImo 99.30%, RecLink 99.5%, Febrl 98.86%, and FRIL 96.17%) and sensitivity (99.87% versus AtyImo 98.91%, RecLink 73.75%, Febrl 90.58%, and FRIL 74.66%). In the case study, using a ROC curve to choose the most appropriate cut-off value (0.896), the obtained metrics were: sensitivity = 92.5% (95% CI 92.07-92.99), specificity = 93.5% (95% CI 93.08-93.8) and area under the curve (AUC) = 97% (95% CI 96.97-97.35). The multi-core computation was about four times faster (150 seconds) than the serial setting (550 seconds) when using a dataset of 20 million records. CONCLUSION: The CIDACS-RL algorithm is an innovative linkage tool for huge datasets, with higher accuracy, improved scalability, and substantially shorter execution time compared to other existing linkage tools.
In addition, CIDACS-RL can be deployed on standard computers without the need for high-speed processors and distributed infrastructures.
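
Illustrative note: sensitivity and positive predictive value, the two metrics compared in this abstract, can be computed from a set of linked pairs and a gold standard set of true pairs. The sketch below uses the standard definitions; the function name `linkage_metrics` and the toy record identifiers are hypothetical and are not part of CIDACS-RL.

```python
def linkage_metrics(predicted_pairs, true_pairs):
    """Return (sensitivity, positive predictive value) for a set of linked pairs.

    Each pair is a tuple (id_in_dataset_a, id_in_dataset_b).
    """
    predicted, truth = set(predicted_pairs), set(true_pairs)
    true_positives = len(predicted & truth)    # links that are real matches
    false_positives = len(predicted - truth)   # links that are not real matches
    false_negatives = len(truth - predicted)   # real matches the tool missed
    sensitivity = (
        true_positives / (true_positives + false_negatives)
        if truth else float("nan")
    )
    ppv = (
        true_positives / (true_positives + false_positives)
        if predicted else float("nan")
    )
    return sensitivity, ppv

# Toy example with hypothetical identifiers.
gold = {(1, "a"), (2, "b"), (4, "d"), (5, "e")}
linked = {(1, "a"), (2, "b"), (3, "x")}
sensitivity, ppv = linkage_metrics(linked, gold)
print(f"sensitivity={sensitivity:.2f}, PPV={ppv:.2f}")
```

A tool can score high on PPV while missing many true matches, which is why the abstract reports both figures for each tool.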


Subject(s)
Datasets as Topic , Information Storage and Retrieval , Medical Record Linkage , Algorithms , Cohort Studies , Humans , Medical Records Systems, Computerized
5.
BMC Med Inform Decis Mak ; 20(1): 173, 2020 07 25.
Article in English | MEDLINE | ID: mdl-32711532

ABSTRACT

BACKGROUND: Research using linked routine population-based data collected for non-research purposes has increased in recent years because they are a rich and detailed source of information. The objective of this study is to present an approach to prepare and link data from administrative sources in a middle-income country, to estimate its quality and to identify potential sources of bias by comparing linked and non-linked individuals. METHODS: We linked two administrative datasets from Brazil with data covering the period 2001 to 2015, using maternal attributes (name, age, date of birth, and municipality of residence): the live birth information system and the 100 Million Brazilian Cohort (created using administrative records from over 114 million individuals whose families applied for social assistance via the Unified Register for Social Programmes), implementing an in-house developed linkage tool, CIDACS-RL. We then estimated the proportion of highly probable links and examined the characteristics of missed matches to identify any potential source of bias. RESULTS: A total of 27,699,891 live births were submitted to linkage with maternal information recorded in the baseline of the 100 Million Brazilian Cohort dataset; of those, 16,447,414 (59.4%) children were found registered in the 100 Million Brazilian Cohort dataset. The proportion of highly probable links ranged from 39.3% in 2001 to 82.1% in 2014. A substantial improvement in the linkage after the introduction of the maternal date of birth attribute, in 2011, was observed. Our analyses indicated a slightly higher proportion of missing data among missed matches and a higher proportion of people living in an urban area and self-declared as Caucasian among linked pairs when compared with non-linked sets.
DISCUSSION: We demonstrated that CIDACS-RL is capable of performing high quality linkage even with a limited number of common attributes, using indexation as a blocking strategy in large routine databases from a middle-income country. However, residual records occurred more among people under worse living conditions. The results presented in this study reinforce the need to evaluate linkage quality and, when necessary, to take linkage error into account in the analyses of any generated dataset.


Subject(s)
Databases, Factual , Parturition , Brazil , Cohort Studies , Female , Humans , Male , Medical Record Linkage , Pregnancy
7.
Front Pharmacol ; 10: 984, 2019.
Article in English | MEDLINE | ID: mdl-31607900

ABSTRACT

Health technology assessment (HTA) is the systematic evaluation of the properties and impacts of health technologies and interventions. In this article, we presented a discussion of HTA and its evolution in Brazil, as well as a description of secondary data sources available in Brazil with potential applications to generate evidence for HTA and policy decisions. Furthermore, we highlighted record linkage, ongoing record linkage initiatives in Brazil, and the main linkage tools developed and/or used in Brazilian data. Finally, we discussed the challenges and opportunities of using secondary data for research in the Brazilian context. In conclusion, we emphasized the availability of high quality data and an open, modern attitude toward the use of data for research and policy. This is supported by a rigorous but enabling legal framework that will allow the conduct of large-scale observational studies to evaluate clinical, economic, and social impacts of health technologies and social policies.

8.
IEEE J Biomed Health Inform ; 22(2): 346-353, 2018 03.
Article in English | MEDLINE | ID: mdl-29505402

ABSTRACT

Data linkage refers to the process of identifying and linking records that refer to the same entity across multiple heterogeneous data sources. This method has been widely utilized across scientific domains, including public health where records from clinical, administrative, and other surveillance databases are aggregated and used for research, decision making, and assessment of public policies. When a common set of unique identifiers does not exist across sources, probabilistic linkage approaches are used to link records using a combination of attributes. These methods require a careful choice of comparison attributes as well as similarity metrics and cutoff values to decide if a given pair of records matches or not and for assessing the accuracy of the results. In large, complex datasets, linking and assessing accuracy can be challenging due to the volume and complexity of the data, the absence of a gold standard, and the challenges associated with manually reviewing a very large number of record matches. In this paper, we present AtyImo, a hybrid probabilistic linkage tool optimized for high accuracy and scalability in massive data sets. We describe the implementation details around anonymization, blocking, deterministic and probabilistic linkage, and accuracy assessment. We present results from linking a large population-based cohort of 114 million individuals in Brazil to public health and administrative databases for research. In controlled and real scenarios, we observed high accuracy of results: 93%-97% true matches. In terms of scalability, we present AtyImo's ability to link the entire cohort in less than nine days using Spark and scaling up to 20 million records in less than 12 s over heterogeneous (CPU+GPU) architectures.
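
Illustrative note: the abstract describes deciding whether a pair of records matches by combining per-attribute similarity metrics and applying a cutoff. A minimal sketch of that idea follows, assuming a weighted mean of string similarities from Python's standard `difflib`. The attribute names, weights, cutoff, and sample records are hypothetical, and the approach is a generic stand-in, not AtyImo's actual scoring.

```python
from difflib import SequenceMatcher

# Hypothetical attribute weights; a real tool would tune these per dataset.
WEIGHTS = {"name": 0.5, "birth_date": 0.3, "municipality": 0.2}

def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def pair_score(rec_a: dict, rec_b: dict, weights: dict = WEIGHTS) -> float:
    """Weighted mean of per-attribute similarities for one candidate pair."""
    total_weight = sum(weights.values())
    weighted = sum(
        weight * similarity(rec_a[field], rec_b[field])
        for field, weight in weights.items()
    )
    return weighted / total_weight

def is_match(rec_a: dict, rec_b: dict, cutoff: float = 0.9) -> bool:
    """Classify a pair as a match when its score reaches the (assumed) cutoff."""
    return pair_score(rec_a, rec_b) >= cutoff

# Hypothetical records: rec_b has a transposition typo in the name.
rec_a = {"name": "Maria Silva", "birth_date": "1980-01-02", "municipality": "Salvador"}
rec_b = {"name": "Maria Sliva", "birth_date": "1980-01-02", "municipality": "Salvador"}
rec_c = {"name": "Joao Souza", "birth_date": "1975-11-30", "municipality": "Recife"}

print(pair_score(rec_a, rec_b), pair_score(rec_a, rec_c))
```

In practice the cutoff is chosen against labelled pairs, and blocking is applied first so that only plausible candidate pairs are scored.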


Subject(s)
Databases, Factual , Electronic Health Records , Information Storage and Retrieval , Brazil , Cohort Studies , Humans